CSR Data Collection Pilot

نویسندگان

  • Denise Danielson
  • Jared Bernstein
چکیده

The objective of the CSR Corpus Development is to collect and deliver a large corpus of continuous speech data to support DARPA research efforts in continuous speech recognition (CSR). The CSR corpus is intended to be task independent and to consist of speech that is similar to that which would be expected from eventual users of real world CSR systems. Toward these ends, the current pilot collection effort was designed to minimize controls on vocabulary, background noise, and microphones. CSR is also designed to incorporate spontaneous speech.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spontaneous Speech Collection for the CSR Corpus

As part of a pilot data collection for DARPA's Continuous Speech Recognition (CSR) speech corpus, SRI International experimented with the collection of spontaneous speeoh material. The bulk of the CSR pilot data was read versions of news articles from the Wall Street Journal (WSJ), and the spontaneous sentences were to be similar material, but spontaneously dictated. In the first pilot portion ...

متن کامل

NIST-DARPA Interagency Agreement: Spoken Language Program

1. To coordinate the design, development and distribution of speech and natural language corpora for the DARPA Spoken Language research community. 2. To design, coordinate implementation, and analyze results, of performance assessment "benchmark tests" for DARPA's speech recognition and spoken language understanding systems. 1. Completed production of the six-CD-ROM-set for ATIS0, and made this...

متن کامل

Collection and Analyses of WSJ-CSR Data at MIT

Recently, the DARPA community started a new data collection initiative in the Wall Street Journal (WSJ) domain to support research and development of very large vocabulary continuous speech recognition (CSR) systems. Since August 1991, our group has actively participated in the development of the WSJ-CSR corpus. The purpose of this paper is to document our involvement in this process, from reco...

متن کامل

CSR Data Collection

The CSR Development and Evaluation Spokes data collection task yielded 4435 development test utterances fr~n 30 speakers and 4878 evaluation test utterances from a different set of 30 speakers. The development test data covered eight different spoke conditions, each of which had its own distinct combination of subject, prompt text, microphone and recording environment requirements. Similarly, t...

متن کامل

CSR Corpus Development

The CSR (Connected Speech Recognition) corpus represents a new DARPA speech recognition technology development initiative to advance the state of the art in CSR. This corpus essentially supersedes the now old Resource Management (RM) corpus that has fueled DARPA speech recognition technology development for the past 5 years. The new CSR corpus supports research on major new problems including u...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992